Skip to content

LCORE-1037: update BYOK and RAG guides to use lightspeed-stack config#1842

Merged
are-ces merged 1 commit into
lightspeed-core:mainfrom
are-ces:lcore-1037-update-byok-guide-stack-config
Jun 3, 2026
Merged

LCORE-1037: update BYOK and RAG guides to use lightspeed-stack config#1842
are-ces merged 1 commit into
lightspeed-core:mainfrom
are-ces:lcore-1037-update-byok-guide-stack-config

Conversation

@are-ces

@are-ces are-ces commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Rewrite BYOK and RAG guides to reference lightspeed-stack.yaml configuration instead of Llama Stack run.yaml
  • Remove all run.yaml / Llama Stack-specific references from both guides
  • Address review nits: fix pdmuv, remove stale OpenAI note, correct Ollama tool calling description, fix incorrect embedding dimensions

Type of change

  • Documentation Update

Tools used to create PR

  • Assisted-by: Claude Opus 4.6
  • Generated by: Claude Opus 4.6

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Documentation-only change. Verified YAML examples are consistent with current config model structure.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation

    • Updated the BYOK setup guide to use the uv run command for running embedding model downloads.
    • Clarified Ollama inference provider limitations regarding tool calling support in the RAG documentation.
    • Removed unnecessary LLM configuration prerequisite instructions from the BYOK guide.
  • Chores

    • Updated RAG vector store embedding dimension values in example deployment configurations.

- Replace pdm with uv for embedding model download command
- Remove unnecessary OpenAI API key note
- Fix Ollama section: clarify no tool RAG but inline RAG supported
- Remove empty References section from RAG guide
- Fix incorrect embedding dimensions for all-mpnet-base-v2 (768, not 1024/384)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

This PR updates documentation and configuration for BYOK and RAG setup. It corrects the embedding model download tool from pdm to uv, removes outdated LLM configuration guidance, clarifies Ollama provider capabilities around tool-calling support, and updates example configuration to standardize embedding dimensions to 768.

Changes

Documentation and Configuration Updates

Layer / File(s) Summary
BYOK guide tool and configuration updates
docs/byok_guide.md
Embedding model download command uses uv run python instead of pdm run python. Removes note about configuring LLM inference provider (e.g., OPENAI_API_KEY) in run.yaml.
RAG guide Ollama provider clarification
docs/rag_guide.md
Ollama remote::ollama provider documentation explicitly states tool-calling is unsupported (RAG-as-tool unavailable), while clarifying inline RAG remains supported. Removes caveats about OpenAI-compatible endpoints and experimental tool-calling patches. Extra formatting separator removed.
Example YAML embedding dimension configuration
examples/lightspeed-stack-byok-okp-rag.yaml
BYOK RAG vector store configurations for ocp-docs and knowledge-base updated to use consistent embedding_dimension: 768.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

Suggested reviewers

  • tisnik
  • syedriko
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly references the main objective: updating BYOK and RAG guides to use lightspeed-stack config, which aligns with all three changed files.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@are-ces are-ces requested a review from syedriko June 3, 2026 09:03

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/rag_guide.md`:
- Line 226: Replace the blanket statement that "remote::ollama does not support
tool calling" with a clarification that remote::ollama (OGX/Llama Stack
integration) can support OpenAI-compatible tool calling via the standard
tools/tool_calls interface, but tool-calling or RAG-as-a-tool may fail depending
on the specific Ollama model's chat-template and tool wiring (e.g., custom or
nonstandard models); update the sentence in docs/rag_guide.md to reflect that
support is conditional on the model's chat-template/tool configuration rather
than absent altogether.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1d83c5af-58d1-4eb0-875e-30d4502b8b32

📥 Commits

Reviewing files that changed from the base of the PR and between 6738771 and ac0b354.

📒 Files selected for processing (3)
  • docs/byok_guide.md
  • docs/rag_guide.md
  • examples/lightspeed-stack-byok-okp-rag.yaml
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: Pylinter
  • GitHub Check: build-pr
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: src/client.py:104-108
Timestamp: 2026-05-20T08:09:36.724Z
Learning: In the lightspeed-stack repo, the synthesized `run.yaml` file handling in `src/client.py` (`_synthesize_library_config`) uses a fixed `/tmp` path intentionally in the PoC (PR `#1580`). The durable production requirements are tracked in spec doc R10 (docs/design/llama-stack-config-merge/llama-stack-config-merge.md): persistent known path overwritten each boot, file mode 0600 set via explicit create flag (not umask), and a `--synthesized-config-output` CLI flag for debugging. The PoC code is scheduled for removal pre-merge; the implementation JIRA "Unified llama_stack.config schema + synthesizer" inherits R10's requirements.
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.

Applied to files:

  • examples/lightspeed-stack-byok-okp-rag.yaml
🔇 Additional comments (2)
examples/lightspeed-stack-byok-okp-rag.yaml (1)

41-41: LGTM!

Also applies to: 48-48

docs/byok_guide.md (1)

164-164: ⚡ Quick win

Confirm uv for rag-content embedding download script.

The rag-content docs instruct running the download script with uv locally (e.g., uv run python ./scripts/download_embeddings_model.py -l ... -r sentence-transformers/all-mpnet-base-v2), matching the updated command in docs/byok_guide.md.

Comment thread docs/rag_guide.md

Tool calling with Ollama is not yet supported.
Currently, tool calling is not supported out of the box. Some experimental patches exist (including internal workarounds), but these are not officially released.
The `remote::ollama` provider does not support tool calling, so RAG as a tool is not available. However, inline RAG is supported.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Does Ollama support tool calling or function calling as of 2026?

💡 Result:

Yes. As of 2026-06-03, Ollama supports tool calling (also called function calling) in its chat API. Ollama’s official documentation states that “Ollama supports tool calling (also known as function calling)” and describes how to provide a tools list in the request, how the model returns tool calls, and how to send tool results back for the model to complete its reply [1]. The API documentation in Ollama’s repo likewise documents a tools parameter (with returned tool_calls) in chat requests [2], and Ollama’s blog post explains enabling tool calling by providing tools and that supported models return tool_calls [3]. So, for an up-to-date 2026 answer: Ollama has built-in tool/function calling capability (for models that support it), via the standard tools/tool_calls mechanism in the chat endpoints [1][2][3].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# List files that mention remote::ollama
rg -n "remote::ollama" -S . || true

# Find likely provider files (ts/js/python/etc) mentioning "ollama" and "provider"
rg -n "ollama" -S . | head -n 200 || true

# If there is a providers directory, narrow search
if [ -d "src" ]; then
  rg -n "remote::ollama|ollama.*provider|provider.*ollama" -S src || true
fi
if [ -d "packages" ]; then
  rg -n "remote::ollama|ollama.*provider|provider.*ollama" -S packages || true
fi

# Show the docs line context around docs/rag_guide.md:226
if [ -f "docs/rag_guide.md" ]; then
  sed -n '210,245p' docs/rag_guide.md | nl -ba | sed -n '1,60p'
fi

Repository: lightspeed-core/lightspeed-stack

Length of output: 2896


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show docs/providers.md around the ollama row
if [ -f "docs/providers.md" ]; then
  sed -n '1,120p' docs/providers.md | nl -ba | sed -n '1,120p'
else
  echo "Missing docs/providers.md"
fi

# Find files mentioning remote::ollama
rg -n "remote::ollama" -S . || true

# Find where provider_id="ollama" is used (docs/openapi.md likely, but also code)
rg -n "provider_id=\"ollama\"|provider_id='ollama'|provider.*ollama|ollama.*provider" -S . || true

# Search for tool calling plumbing in the codebase
rg -n "tool(s)?_call(s)?|tool_calls|tools\\b|function calling|function_calling" -S . || true

# Search for any ollama-specific client code that may build tool payloads
rg -n "ollama" -S src packages . | head -n 200 || true

Repository: lightspeed-core/lightspeed-stack

Length of output: 119


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect docs/providers.md around the ollama row (avoid nl)
if [ -f "docs/providers.md" ]; then
  # Print first 120 lines (likely includes table header)
  sed -n '1,140p' docs/providers.md
  echo "----"
  # Print around where 'ollama | remote' appears
  # Use line numbers via grep -n (supported) but avoid nl
  line=$(grep -n "ollama |" docs/providers.md | head -n 1 | cut -d: -f1 || true)
  if [ -n "${line:-}" ]; then
    start=$(( line>20 ? line-20 : 1 ))
    end=$(( line+40 ))
    sed -n "${start},${end}p" docs/providers.md
  else
    # fallback: print a larger section
    sed -n '1,240p' docs/providers.md
  fi
else
  echo "Missing docs/providers.md"
fi

echo "----"

# Find files that mention remote::ollama
rg -n "remote::ollama" -S . || true

echo "----"

# Find tool-calling related plumbing
rg -n "tool_calls|tool-call|function_call|function calling|\\btools\\b" -S . || true

echo "----"

# Find ollama provider code if present
rg -n "provider.*ollama|ollama.*provider|provider_id[=:\"']ollama" -S . || true

Repository: lightspeed-core/lightspeed-stack

Length of output: 50392


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- rag_guide context ----"
sed -n '215,240p' docs/rag_guide.md | cat

echo "---- search for remote::ollama ----"
rg -n "remote::ollama" -S . --glob '!tests/**' || true

echo "---- search for ollama + tool calling indicators (exclude tests) ----"
rg -n "(ollama).*(tool calling|tool-calling|tool_calls|function calling|function_calling|\\btools\\b)" -S . --glob '!tests/**' || true

echo "---- search for tool calling indicators near ollama (exclude tests) ----"
rg -n "(tool calling|tool-calling|tool_calls|function calling|function_calling|\\btools\\b)" -S docs --glob '*ollama*' || true
rg -n "(ollama)" -S docs --glob '*.md' || true

echo "---- list likely source files mentioning ollama (exclude tests) ----"
rg -n "(ollama)" -S . --glob '!tests/**' --max-count 50 || true

Repository: lightspeed-core/lightspeed-stack

Length of output: 6547


🌐 Web query:

llama-stack remote::ollama tool calling support tool_calls

💡 Result:

Llama Stack does support tool calling when using the remote::ollama provider, via the OpenAI-compatible chat.completions interface’s tools/tool_calls fields—but support can depend on (a) the Ollama model’s chat template/tool-calling wiring and (b) Llama Stack version. 1) Whether tool calling is supported - Llama Stack’s Ollama integration is implemented as a remote provider spec with provider_type "remote::ollama" [1]. - Tool calling is expected to work through standard OpenAI-compatible chat completions (tools passed in request, model returns tool_calls in response) as shown in Llama Stack tests and related provider work (e.g., remote-vllm tool calling support landed, indicating the general tool-calling mechanism is wired end-to-end for remote providers) [2]. - For Ollama specifically, Ollama’s own docs confirm it supports tool calling using tool_calls in responses and a follow-up request that includes tool results [3]. 2) Known caveat: tools may not be passed through for non-“Llama” Ollama models / certain model template conversions - There is an explicit Llama Stack issue reporting that when using an Ollama model that is not detected as a Llama model (e.g., custom-converted models), the.Tools variable in the Ollama template is not filled with the tools passed to Llama Stack, i.e., tool calling can fail because the model never receives tool definitions [4]. 3) Practical implication for “tool_calls” support - If your Ollama model’s chat template expects a.Tools section and that section is correctly populated by the Llama Stack Ollama provider, you should see tool_calls come back in chat completion responses (matching Ollama’s documented tool_calls behavior) [3]. - If the model is not properly detected / its template path differs (the scenario described in the issue), tool definitions may not be injected, and tool_calls may not be produced (or may be unreliable) [4]. 4) What to check in your setup - Ensure you are using the Llama Stack API endpoint variant that carries OpenAI-style tools (chat.completions) and passes them to the remote::ollama provider [1]. - Verify your Ollama model supports tool calling in its template (Ollama documents the required tool_calls flow) [3]. - If you use a custom-converted or nonstandard Ollama model, be aware of the reported provider/template bug for non-Llama models where tools are not injected [4]. Bottom line: Yes—Llama Stack + remote::ollama is intended to support tool calling via tool_calls, and Ollama supports tool_calls. But tool calling can break for certain Ollama models/templates (not properly detected as Llama), so your results may vary depending on the model and Llama Stack version [4][3].

Citations:


🌐 Web query:

site:github.com/openai/llama-stack remote::ollama tools tool_calls

💡 Result:

  • The repo you’re trying to search (github.com/openai/llama-stack) doesn’t appear to be the active home anymore—“Llama Stack is now OGX” and the code lives at ogx-ai/ogx (GitHub shows the rename/announcement in the README). [1]
  • remote::ollama provider docs (OGX): configuration includes base_url and the sample uses OLLAMA_URL (defaulting to http://localhost:11434/v1). [2]
  • Providers index (OGX): shows where remote::ollama sits under Inference providers and links to it. [3]
  • Ollama tool calling (tool_calls) reference: Ollama documents tool calling and shows response.message.tool_calls[...] usage. [4]

Sources
[1] (github.com)
[2] (ogx-ai.github.io)
[3] (ogx-ai.github.io)
[4] (docs.ollama.com)

Citations:


🌐 Web query:

site:docs.ollama.com tool calling tools parameter tool_calls

💡 Result:

Tool calling is a capability in LLMs that allows the model to interact with external tools (such as functions, APIs, or databases) by requesting their execution during a conversation [1]. When a model determines that a tool is needed to answer a user's request, it generates a structured request—typically referred to as a tool call—instead of a standard text response [1]. In API implementations (such as the OpenAI-compatible API often used by tools like Ollama), this is facilitated through the tool_calls parameter [1][2]. How tool_calls works: 1. Tool Definition: The user provides a list of available tools (often using a JSON schema format) in the request to the model [1][3]. 2. Model Request: If the model decides to use a tool, it returns a response containing a tool_calls array [1]. Each object in this array includes details about the requested function, such as: - Name: The identifier of the function to execute [1]. - Arguments: The parameters the model has determined are necessary for the function [1]. 3. Execution and Response: The application code executes the specified function using the provided arguments [1]. After execution, the application sends the result back to the model by adding a new message to the conversation history with the role of tool, including the tool_name and the content (the function's output) [1]. 4. Final Answer: The model then processes this result and generates a final response for the user [1]. This process supports parallel tool calling, where multiple tool requests can be generated in a single turn, and multi-turn loops where the model continues to call tools until it has sufficient information to provide a final answer [1]. When using streaming, tool calls are emitted as partial chunks, which the client must accumulate to fully reconstruct the tool call before execution [1][4].

Citations:


Fix misleading remote::ollama “no tool calling” documentation

docs/rag_guide.md currently claims remote::ollama “does not support tool calling,” but Ollama (and the OGX/Llama Stack remote::ollama integration) supports tool calling via OpenAI-compatible tools/tool_calls. Tool-calling/RAG-as-a-tool can still fail depending on the specific Ollama model’s chat-template/tool wiring (e.g., custom/nonstandard models), so the docs should describe this dependency instead of stating blanket non-support.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/rag_guide.md` at line 226, Replace the blanket statement that
"remote::ollama does not support tool calling" with a clarification that
remote::ollama (OGX/Llama Stack integration) can support OpenAI-compatible tool
calling via the standard tools/tool_calls interface, but tool-calling or
RAG-as-a-tool may fail depending on the specific Ollama model's chat-template
and tool wiring (e.g., custom or nonstandard models); update the sentence in
docs/rag_guide.md to reflect that support is conditional on the model's
chat-template/tool configuration rather than absent altogether.

@are-ces are-ces merged commit f1c6cec into lightspeed-core:main Jun 3, 2026
30 of 31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants